Abstract

We give a reduction from CLIQUE to establish that sparse PCA is NP-hard. The reduction
has a gap, which we use to exclude an FPTAS for sparse PCA (unless P=NP). Under stronger
(average-case) complexity assumptions, we also exclude polynomial-time constant-factor
approximation algorithms.


1 Introduction 

The earliest reference to principal components analysis (PCA) is in [14]. Since then, PCA has
evolved into a classic tool for data analysis. A challenge for the interpretation of the principal 
components (or factors) is that they can be linear combinations of all the original variables. When 
the original variables have direct physical significance (e.g. genes in biological applications or assets 
in financial applications) it is desirable to have factors which have loadings on only a small number 
of the original variables. These interpretable factors are sparse principal components (SPCA).
There are many heuristics for obtaining sparse factors […, 16], as well as some approximation
algorithms with provable guarantees [2]. Our goal in this short paper is to establish the
NP-hardness and inapproximability of SPCA using a reduction from CLIQUE.

The traditional formulation of sparse PCA is as cardinality-constrained variance maximization:

Problem: SPCA (sparse PCA) 

Input: Symmetric matrix $S \in \mathbb{R}^{n \times n}$; sparsity $r > 0$; variance $M > 0$.

Question: Does there exist a unit vector $v \in \mathbb{R}^n$ with at most $r$ non-zero elements
($v^T v = 1$ and $\|v\|_0 \le r$) for which $v^T S v \ge M$?
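To make the decision problem concrete, here is a minimal brute-force sketch in Python, assuming
NumPy; the names `spca_opt` and `spca_decide` are ours, not from the text. For a fixed support
the best unit vector is a top eigenvector of the corresponding principal submatrix, and by
eigenvalue interlacing it is enough to try supports of size exactly $\min(r, n)$. The running
time grows like $\binom{n}{r}$, so this is only useful for checking tiny instances.

```python
from itertools import combinations

import numpy as np


def spca_opt(S, r):
    """Brute-force OPT(S, r): max of v^T S v over unit v with at most r non-zeros."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    k = min(int(r), n)  # interlacing: larger supports never lower the top eigenvalue
    best = -np.inf
    for support in combinations(range(n), k):
        idx = list(support)
        # eigvalsh returns eigenvalues in ascending order; take the largest one
        top = np.linalg.eigvalsh(S[np.ix_(idx, idx)])[-1]
        best = max(best, top)
    return best


def spca_decide(S, r, M, tol=1e-9):
    """The SPCA question: is there an r-sparse unit v with v^T S v >= M?"""
    return spca_opt(S, r) >= M - tol


S = np.array([[2.0, 1.0], [1.0, 2.0]])
print(spca_opt(S, 1))  # best single coordinate: 2.0
print(spca_opt(S, 2))  # unconstrained top eigenvalue: 3.0 (up to rounding)
```

The tolerance `tol` only guards against floating-point rounding in the eigenvalue computation.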

In the machine learning context, $S$ is the covariance matrix of the data and, when there is no
sparsity constraint, the solution $v^*$ is the top right singular vector of $S$ (equivalently,
since a covariance matrix is positive semidefinite, its top eigenvector). A generalization of SPCA
is the sparse generalized eigenvalue problem: maximize $v^T S v$ subject to $v^T Q v = 1$ and
$\|v\|_0 \le r$. This generalized eigenvalue problem is NP-hard […], via a reduction from sparse
regression, which is known to be NP-hard [13, 17]. It is deeply embedded folklore that SPCA is
NP-hard. The importance of sparse factors in dimensionality reduction was recognized in early
work: the varimax criterion [9] has been used to rotate the factors to encourage sparsity, and
this has been used in multi-dimensional scaling approaches to dimensionality reduction [13, …].

Notation. $A, B, \ldots$ are matrices; $a, b, \ldots$ are vectors; and $G, H, \ldots$ are graphs.
The top eigenvalue of a matrix $A$ is $\lambda_1(A)$; $\|A\|_2$ is the spectral norm; $I$ is the
identity matrix. For an undirected graph $G$, its adjacency matrix $A$ is a $(0,1)$-matrix with
$A_{ij} = 1$ whenever edge $(i, j)$ is in $G$. The spectral radius of a graph is the spectral norm
of its adjacency matrix (also the top eigenvalue $\lambda_1$). $\mathbf{0}$ (resp. $\mathbf{1}$)
denotes vectors or matrices of only zeros (resp. ones); for example, $\mathbf{1}_{2\times 2}$ is a
$2 \times 2$ matrix of ones.
2 Reduction from CLIQUE


Problem: CLIQUE

Input: Undirected graph $G = (V, E)$; clique size $K$.
Question: Does there exist a $K$-clique in $G$?


The reduction is fairly straightforward. Given the inputs $(G, K)$ for CLIQUE, we construct the
inputs $(S, r, M)$ for SPCA as follows: let $S$ be the adjacency matrix of $G$, let $r = K$, and
let $M = K - 1$. Clearly the reduction is polynomial. It only remains to prove that there is a
$K$-clique in $G$ if and only if there is a $K$-sparse unit vector $v$ for which
$v^T S v \ge K - 1$. We need the following lemma on the spectral radius (top eigenvalue) of an
adjacency matrix.

Lemma 1 ([…]). Let $A$ be the adjacency matrix of a graph $H$ of order $\ell$. If $H$ is an
$\ell$-clique, then $\|A\|_2 = \lambda_1(A) = \ell - 1$; if $H$ is not an $\ell$-clique, then
$\|A\|_2 = \lambda_1(A) < \ell - 1$.
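A quick numerical sanity check of Lemma 1, as a sketch assuming NumPy (`spectral_radius` is a
helper name we introduce): the complete graph on $\ell$ nodes has top eigenvalue $\ell - 1$, and
deleting a single edge pushes it strictly below.

```python
import numpy as np


def spectral_radius(A):
    """lambda_1 of a symmetric adjacency matrix (equal to ||A||_2 since A is non-negative)."""
    return np.linalg.eigvalsh(np.asarray(A, dtype=float))[-1]


ell = 6
clique = np.ones((ell, ell)) - np.eye(ell)  # adjacency matrix of the ell-clique
not_clique = clique.copy()
not_clique[0, 1] = not_clique[1, 0] = 0.0   # remove one edge

print(spectral_radius(clique))      # 5.0 up to rounding, i.e. ell - 1
print(spectral_radius(not_clique))  # about 4.70, strictly below ell - 1
```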

We now prove the claim. Suppose $Q$ is a $K$-clique in $G$ and let $S_Q$ be the $K \times K$
principal submatrix of $S$ corresponding to the nodes in $Q$. Let $z$ be a unit-norm top
eigenvector of $S_Q$, and let $v(z)$ be the vector with $K$ non-zeros induced by $z$: the non-zeros
in $v$ are at the indices corresponding to the nodes in $Q$, and their values are the corresponding
entries of $z$. Then

$$v^T S v = z^T S_Q z = \lambda_1(S_Q) = K - 1,$$

where the last equality follows from Lemma 1 because $S_Q$ is the adjacency matrix of a $K$-clique.
So $v(z)$ is a $K$-sparse unit vector for which $v^T S v \ge K - 1$. Now suppose that there is a
unit-norm $K$-sparse $v$ for which $v^T S v \ge K - 1$. Choose a set of $K$ indices containing the
support of $v$ (if $v$ has fewer than $K$ non-zeros, add arbitrary indices). Let $S_Q$ be the
$K \times K$ principal submatrix of $S$ on these indices, and let $z(v)$ be the $K$-dimensional
unit vector consisting of the entries of $v$ at these indices. Let $Q$ be the subgraph induced by
the corresponding nodes ($S_Q$ is the adjacency matrix of $Q$). Then
$v^T S v = z^T S_Q z \ge K - 1$, and so $\lambda_1(S_Q) \ge z^T S_Q z \ge K - 1$ by the Rayleigh
quotient characterization of $\lambda_1$. By Lemma 1, if $Q$ is not a $K$-clique then
$\lambda_1(S_Q) < K - 1$, so it follows that $Q$ is a $K$-clique. Clearly SPCA is in NP, and so it
is NP-complete.
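The reduction and the witness $v(z)$ from the first half of the proof can be sketched as follows,
assuming NumPy. All function names are hypothetical helpers, and both brute-force checks are
exponential, so this is only a toy verification of the "if and only if" on two small graphs.

```python
from itertools import combinations

import numpy as np


def clique_to_spca(n, edges, K):
    """Map a CLIQUE instance (G, K) to the SPCA instance (S, r, M) = (A_G, K, K - 1)."""
    S = np.zeros((n, n))
    for i, j in edges:
        S[i, j] = S[j, i] = 1.0
    return S, K, K - 1


def witness(S, Q):
    """The vector v(z): a unit top eigenvector z of S_Q placed on the indices in Q."""
    Q = list(Q)
    _, vecs = np.linalg.eigh(S[np.ix_(Q, Q)])
    v = np.zeros(S.shape[0])
    v[Q] = vecs[:, -1]  # eigh sorts eigenvalues ascending; last column is the top one
    return v


def has_k_clique(n, edges, K):
    """Brute-force CLIQUE (exponential; only for checking the reduction)."""
    E = {frozenset(e) for e in edges}
    return any(all(frozenset(p) in E for p in combinations(Q, 2))
               for Q in combinations(range(n), K))


def spca_decide(S, r, M, tol=1e-9):
    """Brute-force SPCA: does some r-node principal submatrix have lambda_1 >= M?"""
    n = S.shape[0]
    return any(np.linalg.eigvalsh(S[np.ix_(Q, Q)])[-1] >= M - tol
               for Q in map(list, combinations(range(n), min(r, n))))


# A triangle {0, 1, 2} with a pendant node 3, and a 4-cycle (no triangle).
tri_pendant = [(0, 1), (1, 2), (0, 2), (2, 3)]
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
for edges in (tri_pendant, four_cycle):
    for K in (2, 3, 4):
        S, r, M = clique_to_spca(4, edges, K)
        assert spca_decide(S, r, M) == has_k_clique(4, edges, K)

S, _, _ = clique_to_spca(4, tri_pendant, 3)
v = witness(S, [0, 1, 2])
print(v @ S @ v)  # K - 1 = 2 (up to rounding)
```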


3 Inapproximability of SPCA 


We now provide evidence that there is no efficient approximation algorithm for SPCA. First we rule
out the possibility of a fully polynomial time approximation scheme (FPTAS). Given any instance
$(S, r)$ of SPCA, define $\mathrm{OPT}(S, r) = \max_v v^T S v$ over unit-norm $r$-sparse $v$. A
$(1 - \varepsilon)$-approximation algorithm for SPCA produces, for any given instance $(S, r)$, a
unit-norm $r$-sparse solution $v$ satisfying $v^T S v \ge (1 - \varepsilon)\,\mathrm{OPT}(S, r)$.
An FPTAS is an algorithm that computes a $(1 - \varepsilon)$-approximation for every
$\varepsilon > 0$ and every instance of SPCA in time polynomial in $n$, $r$ and $\varepsilon^{-1}$.
The next theorem establishes that there is no polynomial $(1 - O(1/r^2))$-approximation algorithm,
and hence no FPTAS (running an FPTAS with $\varepsilon = 1/r^3 < \varepsilon^*(r)$ would give one).


Theorem 2 (No FPTAS). Unless P=NP, there is no polynomial time $(1 - \varepsilon)$-approximation
algorithm for SPCA with

$$\varepsilon < \varepsilon^*(r) = \frac{r + 1}{2(r - 1)}\left(1 - \sqrt{1 - \frac{8}{(r + 1)^2}}\right) = \frac{2}{r^2 - 1} + O(1/r^4).$$
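As a numerical illustration of the threshold in Theorem 2 (a sketch; `eps_star` and
`lam_minus_edge` are names we introduce), the following compares $\varepsilon^*(r)$ with its
leading term $2/(r^2 - 1)$ and checks that $(r - 1)(1 - \varepsilon^*(r))$ equals the top
eigenvalue of the complete graph minus one edge that is derived in the proof.

```python
import math


def eps_star(r):
    """epsilon*(r) from Theorem 2 (defined for integer r >= 2)."""
    return (r + 1) / (2 * (r - 1)) * (1 - math.sqrt(1 - 8 / (r + 1) ** 2))


def lam_minus_edge(ell):
    """Top eigenvalue of the complete graph on ell nodes minus one edge."""
    return (ell - 3) / 2 + 0.5 * math.sqrt((ell + 1) ** 2 - 8)


for r in (2, 3, 10, 100):
    # epsilon*(r) next to its leading term 2 / (r^2 - 1)
    print(r, eps_star(r), 2 / (r ** 2 - 1))
```

For $r = 2$ the threshold is $\varepsilon^*(2) = 1$, matching the fact that two nodes without their
only edge have spectral radius $0$.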
Proof. The proof essentially amounts to strengthening Lemma 1 for the case that $H$ is not an
$\ell$-clique. Specifically, if in Lemma 1 the adjacency matrix $A$ is not the adjacency matrix of
an $\ell$-clique, then we will show that

$$\|A\|_2 \le \frac{\ell - 3}{2} + \frac{1}{2}\sqrt{(\ell + 1)^2 - 8} = (\ell - 1)\bigl(1 - \varepsilon^*(\ell)\bigr). \qquad (*)$$

Suppose that (*) holds whenever $H$ is not an $\ell$-clique. Suppose a polynomial algorithm
$\mathcal{A}$ gives, for every SPCA instance $(S, r)$, a $(1 - \varepsilon)$-approximation with
$\varepsilon < \varepsilon^*(r)$. We show how to use $\mathcal{A}$ to decide CLIQUE in polynomial
time. Given $(G, K)$, the inputs to CLIQUE, use our reduction to construct $(S, K, K - 1)$, the
inputs to SPCA. Now run algorithm $\mathcal{A}$ on $(S, K)$ to obtain $v$ and compute
$x = v^T S v$. If $x > (K - 1)(1 - \varepsilon^*(K))$, then
$\mathrm{OPT}(S, K) > (K - 1)(1 - \varepsilon^*(K))$, so by (*) some $K$-node subgraph of $G$ is a
clique; hence $\mathrm{OPT}(S, K) = K - 1$ and there is a $K$-clique in $G$. If
$x \le (K - 1)(1 - \varepsilon^*(K))$, then $\mathrm{OPT}(S, K) < K - 1$ (since we have a better
than $(1 - \varepsilon^*(K))$-approximation), and so there is no $K$-clique in $G$.
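The decision step in this argument is a single threshold test. A minimal sketch follows;
`decide_clique` is a hypothetical helper, and `x` stands for the value $v^T S v$ returned by the
assumed approximation algorithm $\mathcal{A}$, which we do not implement.

```python
import math


def eps_star(r):
    """epsilon*(r) from Theorem 2."""
    return (r + 1) / (2 * (r - 1)) * (1 - math.sqrt(1 - 8 / (r + 1) ** 2))


def decide_clique(x, K):
    """True iff the (1 - eps)-approximate value x (eps < eps*(K)) certifies a K-clique."""
    threshold = (K - 1) * (1 - eps_star(K))  # = lambda_1 of the K-clique minus one edge
    return x > threshold


# K = 4: the threshold is (1 + sqrt(17)) / 2, about 2.56. A clique forces
# x >= (1 - eps) * 3 > 2.56, while without a clique x <= OPT <= 2.56.
print(decide_clique(2.9, 4))  # True
print(decide_clique(2.5, 4))  # False
```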

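To prove (*), we first consider the adjacency matrix of a complete graph minus one edge (the first
two nodes are the endpoints of the missing edge),

$$A = \begin{bmatrix} \mathbf{0}_{2\times 2} & \mathbf{1}_{2\times(\ell-2)} \\ \mathbf{1}_{(\ell-2)\times 2} & \mathbf{1}_{\ell-2}\mathbf{1}_{\ell-2}^T - I_{(\ell-2)\times(\ell-2)} \end{bmatrix}.$$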


By symmetry, the top eigenvector can be written

$$\begin{bmatrix} x\,\mathbf{1}_2 \\ y\,\mathbf{1}_{\ell-2} \end{bmatrix}.$$


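The eigenvalue equation is

$$\begin{bmatrix} \mathbf{0}_{2\times 2} & \mathbf{1}_{2\times(\ell-2)} \\ \mathbf{1}_{(\ell-2)\times 2} & \mathbf{1}_{\ell-2}\mathbf{1}_{\ell-2}^T - I_{(\ell-2)\times(\ell-2)} \end{bmatrix} \begin{bmatrix} x\,\mathbf{1}_2 \\ y\,\mathbf{1}_{\ell-2} \end{bmatrix} = \lambda \begin{bmatrix} x\,\mathbf{1}_2 \\ y\,\mathbf{1}_{\ell-2} \end{bmatrix},$$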


and we obtain the equations:

$$(\ell - 2)\,y = \lambda x; \qquad 2x + (\ell - 3)\,y = \lambda y.$$
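The symmetry argument and the two scalar equations can be checked numerically. This is a sketch
assuming NumPy; `complete_minus_edge` is a helper name of ours that builds the block matrix $A$
above, with nodes 0 and 1 as the endpoints of the missing edge.

```python
import numpy as np


def complete_minus_edge(ell):
    """Block form of the adjacency matrix of the complete graph minus one edge."""
    m = ell - 2
    return np.block([
        [np.zeros((2, 2)), np.ones((2, m))],
        [np.ones((m, 2)), np.ones((m, m)) - np.eye(m)],
    ])


ell = 7
A = complete_minus_edge(ell)
eigvals, eigvecs = np.linalg.eigh(A)
lam = eigvals[-1]
u = eigvecs[:, -1]
u = u * np.sign(u[0])  # fix the sign so the (Perron) eigenvector is positive
x, y = u[0], u[2]

# The top eigenvector is constant on the two blocks ...
print(np.allclose(u[:2], x), np.allclose(u[2:], y))  # True True
# ... and (x, y, lam) satisfy the two equations from the text.
print(np.isclose((ell - 2) * y, lam * x), np.isclose(2 * x + (ell - 3) * y, lam * y))  # True True
```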


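Solving for $\lambda$ (substitute $x = (\ell - 2)y/\lambda$ into the second equation) gives the
quadratic $\lambda^2 - (\ell - 3)\lambda - 2(\ell - 2) = 0$, and the positive root is

$$\lambda = \frac{\ell - 3}{2} + \frac{1}{2}\sqrt{(\ell + 1)^2 - 8},$$

which is the expression in (*). Every graph of order $\ell$ that is not an $\ell$-clique is (after
relabeling its nodes) a subgraph of the complete graph minus one edge, and the spectral radius does
not increase when edges are removed; this proves the upper bound in (*). ■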
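A brute-force check of the proof for small $\ell$, as a sketch assuming NumPy: the closed-form
root matches the numerically computed top eigenvalue, and over all $2^{10} - 1$ non-complete graphs
on $\ell = 5$ labelled nodes the largest spectral radius equals that root, as (*) asserts.

```python
from itertools import combinations
import math

import numpy as np


def lam_closed_form(ell):
    """Positive root of lambda^2 - (ell - 3) lambda - 2 (ell - 2) = 0."""
    return (ell - 3) / 2 + 0.5 * math.sqrt((ell + 1) ** 2 - 8)


# 1) Closed form vs. numerics for the complete graph minus one edge.
for ell in range(3, 10):
    A = np.ones((ell, ell)) - np.eye(ell)
    A[0, 1] = A[1, 0] = 0.0
    assert abs(np.linalg.eigvalsh(A)[-1] - lam_closed_form(ell)) < 1e-9

# 2) Exhaustive check of (*) for ell = 5: every edge set except the complete one.
ell = 5
pairs = list(combinations(range(ell), 2))
worst = 0.0
for mask in range(2 ** len(pairs) - 1):  # the all-ones mask (the 5-clique) is excluded
    A = np.zeros((ell, ell))
    for bit, (i, j) in enumerate(pairs):
        if (mask >> bit) & 1:
            A[i, j] = A[j, i] = 1.0
    worst = max(worst, np.linalg.eigvalsh(A)[-1])

print(worst, lam_closed_form(ell))  # both 1 + sqrt(7), about 3.6458
```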

Under stronger (average-case) complexity assumptions, we can also exclude polynomial
constant-factor approximations for SPCA. A natural optimization version of CLIQUE is the
densest-$K$-subgraph problem (DkS): given $(G, K)$, find a subgraph $Q$ on $K$ nodes with the
maximum number of edges. There is evidence that DkS does not admit efficient approximation
algorithms [1].
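For reference, a brute-force sketch of DkS in pure Python (`densest_k_subgraph` is our name). It
enumerates all $\binom{n}{K}$ node subsets, which is what efficient approximation algorithms would
have to avoid.

```python
from itertools import combinations


def densest_k_subgraph(n, edges, K):
    """Return (edge count, node subset) of a densest K-node subgraph, by enumeration."""
    E = {frozenset(e) for e in edges}
    best_count, best_nodes = -1, None
    for nodes in combinations(range(n), K):
        count = sum(frozenset(p) in E for p in combinations(nodes, 2))
        if count > best_count:  # keep the first subset achieving the maximum
            best_count, best_nodes = count, nodes
    return best_count, best_nodes


# A triangle {0, 1, 2} with a pendant node 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(densest_k_subgraph(4, edges, 3))  # (3, (0, 1, 2))
```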

Let $G$ and $G'$ be two graphs on $n$ vertices. Suppose that one of the graphs has an $\ell$-clique
and, for the other graph, every subgraph on $\ell$ vertices has at most $\delta\ell(\ell - 1)/2$
edges, for some $0 < \delta < 1$. If one has a polynomial approximation algorithm for DkS with
approximation factor greater than $\delta$, then one can determine which of $G, G'$ has the
$\ell$-clique in polynomial time. We show that if one has an $\alpha$-approximation algorithm for
SPCA, then one can determine which of $G, G'$ has the $\ell$-clique in polynomial time for
$\delta \le \alpha^2$. This means that if there are no polynomial algorithms to distinguish between
graphs with $\ell$-cliques and graphs whose $\ell$-subsets all have edge density at most $\alpha^2$,
then there are no polynomial $\alpha$-approximation algorithms for SPCA.
Suppose there is an $\alpha$-approximation algorithm for SPCA, so that, given any instance $(S, r)$
of SPCA, in polynomial time one can construct a solution $v$ for which
$v^T S v \ge \alpha\,\mathrm{OPT}(S, r)$. Let $G, G'$ be the two graphs described above with
$\delta = \alpha^2$. Note that

$$\delta = \alpha^2 < \frac{\alpha^2(\ell - 1) + 1}{\ell},$$

where the inequality is because $0 < \alpha < 1$. Now let $A$ be the adjacency matrix of $G$ and
run the $\alpha$-approximation algorithm for SPCA with inputs $(A, \ell)$ to produce a solution $v$.
If $v^T A v \ge \alpha(\ell - 1)$, declare that $G$ contains the $\ell$-clique; otherwise declare
that $G'$ contains the $\ell$-clique. We prove that our algorithm correctly identifies the graph
with the $\ell$-clique.
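The distinguishing procedure can be sketched as follows, assuming NumPy. Here `spca_alg` stands
for the assumed $\alpha$-approximation algorithm; purely for illustration we plug in an exact
brute-force solver `exact_spca` (an exact solver is an $\alpha$-approximation for every
$\alpha \le 1$). All names are hypothetical. The toy pair is a triangle with a pendant node (which
contains a 3-clique) and a 4-cycle (every 3-node subgraph has 2 of its 3 possible edges, so
$\delta = 2/3$), and $\alpha = 0.9$ satisfies $\alpha^2 \ge \delta$.

```python
from itertools import combinations

import numpy as np


def exact_spca(S, r):
    """Exact (exponential-time) SPCA solver, standing in for an alpha-approximation."""
    n = S.shape[0]
    best_val, best_v = -np.inf, None
    for Q in map(list, combinations(range(n), min(r, n))):
        vals, vecs = np.linalg.eigh(S[np.ix_(Q, Q)])
        if vals[-1] > best_val:
            best_val, best_v = vals[-1], np.zeros(n)
            best_v[Q] = vecs[:, -1]
    return best_v


def distinguish(A, ell, alpha, spca_alg):
    """Declare which graph holds the ell-clique: 'G' (the graph behind A) or the other one."""
    v = spca_alg(A, ell)
    return "G" if v @ A @ v >= alpha * (ell - 1) else "G'"


def adjacency(n, edges):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A


with_clique = adjacency(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
four_cycle = adjacency(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(distinguish(with_clique, 3, 0.9, exact_spca))  # G
# With the 4-cycle in the role of G, the answer is that the other graph holds the clique.
print(distinguish(four_cycle, 3, 0.9, exact_spca))   # G'
```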

If $G$ does contain the $\ell$-clique, then $\mathrm{OPT}(A, \ell) = \ell - 1$, and the output $v$
will satisfy $v^T A v \ge \alpha(\ell - 1)$ (because it is an $\alpha$-approximation), so we will
correctly identify $G$ as having the $\ell$-clique.
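Now suppose that $G$ does not contain the $\ell$-clique. Then every $\ell$-node subgraph $Q$ of $G$
has $e \le \delta\ell(\ell - 1)/2$ edges. We now use the bound from [8] on the spectral radius of a
graph with $n$ vertices and $e$ edges, $\|A\|_2 \le \sqrt{2e - n + 1}$. Applying it to the
adjacency matrix $A_Q$ of $Q$ (so $n = \ell$), and since $e \le \delta\ell(\ell - 1)/2$, we have
that

$$\begin{aligned}
\|A_Q\|_2 &\le \sqrt{\delta\ell(\ell - 1) - \ell + 1} \\
&= \sqrt{\alpha^2\ell(\ell - 1) - \ell + 1} \\
&< \sqrt{\bigl(\alpha^2(\ell - 1) + 1\bigr)(\ell - 1) - \ell + 1} \\
&= \alpha(\ell - 1).
\end{aligned}$$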
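Both ingredients of this bound can be checked numerically; this is a sketch assuming NumPy, with
helper names of our own. First, the spectral-radius bound $\sqrt{2e - n + 1}$ from [8] on a few
small connected graphs (the setting in which this bound is classically stated); second, the chain
of inequalities for several $\alpha$ and $\ell$ with $\delta = \alpha^2$, clipping the radicand at
0 when it is negative.

```python
import math

import numpy as np


def adjacency(n, edges):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A


# 1) lambda_1 <= sqrt(2e - n + 1) on small connected graphs.
examples = {
    "path P5": (5, [(0, 1), (1, 2), (2, 3), (3, 4)]),
    "cycle C5": (5, [(i, (i + 1) % 5) for i in range(5)]),
    "star K_{1,4}": (5, [(0, k) for k in range(1, 5)]),
    "K4 minus an edge": (4, [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]),
}
for name, (n, edges) in examples.items():
    lam = np.linalg.eigvalsh(adjacency(n, edges))[-1]
    bound = math.sqrt(2 * len(edges) - n + 1)
    print(f"{name}: lambda_1 = {lam:.4f} <= {bound:.4f}")
    assert lam <= bound + 1e-9

# 2) sqrt(alpha^2 * ell * (ell - 1) - ell + 1) < alpha * (ell - 1) for 0 < alpha < 1.
for alpha in (0.5, 0.9, 0.99):
    for ell in range(2, 200):
        lhs = math.sqrt(max(0.0, alpha ** 2 * ell * (ell - 1) - ell + 1))
        assert lhs < alpha * (ell - 1)
```

The star attains the bound with equality ($\lambda_1 = 2 = \sqrt{8 - 5 + 1}$), so the bound cannot
be improved in general.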


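Since $\|A_Q\|_2 < \alpha(\ell - 1)$ for every $\ell$-node subgraph $Q$ of $G$, we have
$v^T A v \le \mathrm{OPT}(A, \ell) < \alpha(\ell - 1)$, and we will correctly identify $G'$ as
having the $\ell$-clique. The conclusion is summarised in the following theorem.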

Theorem 3. A polynomial $\alpha$-approximation algorithm for SPCA gives a polynomial algorithm to
distinguish between two graphs on $n$ vertices, one of which contains an $\ell$-clique and the other
of which has every subset of $\ell$ nodes spanning at most $\alpha^2\ell(\ell - 1)/2$ edges (for any
$(n, \ell)$).

Under a variety of complexity assumptions, it is known that one cannot efficiently distinguish
between graphs with $\ell$-cliques and graphs in which all subsets of size $\ell$ are sparse (for
varying degrees of sparseness).

Theorem 4 (No constant factor approximation for DkS [1]). Let $1 > \delta > 0$ be any constant
approximation factor. Let $G$ and $G'$ be two graphs on […] vertices. One of the graphs has an
$\ell$-clique and, for the other graph, every subgraph on $\ell$ vertices has at most
$\delta\ell(\ell - 1)/2$ edges. Suppose there is no polynomial time algorithm for solving the hidden
clique problem for a planted clique of size […]. Then there is no polynomial algorithm to determine
which of $G, G'$ has the $\ell$-clique.

Using Theorem 3 with Theorem 4 (applied with $\delta = \alpha^2$), we obtain the following
corollary.

Corollary 5 (No constant factor approximation for SPCA). Suppose there is no polynomial time
algorithm for solving the hidden clique problem for a planted clique of the size required in
Theorem 4. Then, for any constant $0 < \alpha < 1$, there is no polynomial time
$\alpha$-approximation algorithm for SPCA.